36 research outputs found

    A cell set for self-timed design using actel FPGAs

    Get PDF
    technical reportAsynchronous or self-timed systems that do not rely on a global clock to keep system components synchronized can offer significant advantages over traditional clocked circuits in a variety of applications. However, these systems require that suitable self-timed circuit primitives are available for building the system. This report describes a cell set designed for building self-timed circuits and systems using Actel field programmable gate arrays (FPGAs). The cells use a two-phase transition signalling protocol for control signals and a bundled protocol for data signals. This library of macro cells is designed to be used with the Workmen tool suite from VIEWlogic and the Action Logic System (ALS) from Actel

    Low latency self-timed flow-through FIFOs

    Get PDF
    Journal ArticleSelf-timed flow-through FIFOs are constructed easily using only a single C-element as control for each stage of the FIFO. Throughput can be very high in this type of FIFO as the communication required to send new data to the FIFO is local to only the first element of the FIFO. Circuit density can also be high because the control overhead is very small. However, because data must travel through every cell in the FIFO when moving from input to output, latencies can be long. This paper describes some alternative approaches to building self-timed flow-through FIFOs that reduce the latency while retaining the high throughput and relative simplicity of a flow-through design. Five designs are presented: a standard linear pow-through FIFO in which the data pass through every latch in the FIFO, a parallel FIFO in which data are delivered in turn to a set of parallel flow-through FIFOs, a tree FIFO in which data are fanned out into a tree of simple FIFOs, a square FIFO in which the tree is organized as a square array to achieve better layout packing, and a folded FIFO in which data will try to skip as many of the empty FIFO cells as possible to find the shortest path to the output

    Reduced latency self-timed FIFO circuits

    Get PDF
    technical reportSelf-timed flow-through FIFOs are constructed easily using only a single C-element as control for each stage of the FIFO. Throughput can be very high in FIFOs of this type because new data can be sent to the FIFO after communicating locally with only the first element of the FIFO. Therefore the throughput is independent of the depth of the FIFO. Density can also be high because the control overhead is very small. However, because data must travel through every cell in the FIFO when moving from input to output, latencies can be long. This report describes some alternative approaches to building self-timed flow-through FIFOs that reduce the latency while retaining the high throughput and relative simplicity of a flow-through design. Five designs are presented: a standard linear flow-through FIFO in which the data pass through every latch in the FIFO, a parallel FIFO in which data are delivered in turn to a set of parallel flow-through FIFOs, a tree FIFO in which data are fanned out into a tree of simple FIFOs, a square FIFO in which the tree is organized as a square array to achieve better layout packing, and an arbited FIFO in which data will try to skip as many of the empty FIFO cells as possible to find the shortest path to the output

    The NSR processor

    Get PDF
    Journal ArticleThe NSR (Non-Synchronous RISC) processor is a general-purpose computer structured (IS U collection of self-timed blocks that operate concurrently and communicate over bundled data channels in the style of micropipelines [3, 16]. These blocks correspond to standard synchronous pipeline stages such us Instruction Fetch, Instruction Decode, Execute, Memory and register File, but each operates concurrently as a separate self-timed process. In addition to being internally self-timed, the units are decoupled through self-timed FIFO queues between each of the units which allows U high degree of overlap in instruction execution. Branches, jumps, and memory accesses are also decoupled through the use of additional FIFO queues which can hide the execution latency of these instructions. A prototype implementation of the NSR processor has been constructed using Actel FPGAs (Field Programmable Gate Arrays)

    Editorial asynchronous architecture

    Get PDF
    Journal ArticleAsynchronous design is enjoying a worldwide resurgence of interest following several decades in obscurity. Many of the early computers employed asynchronous design techniques, but since the mid 1970s almost all digital design has been based around the use of a central clock. The clock simplifies most aspects of design and offers methodologies which are straightforward and easy to automate. These benefits have helped digital engineers to take advantage of the ever-expanding resource at their disposal, while keeping design costs under control. Comprehensive CAD systems used pervasively in industry, and targeted specifically at synchronous design styles, are one way of achieving this

    Peephole optimization of asynchronous macromodule networks

    Get PDF
    Journal ArticleAbstract- Most high-level synthesis tools for asynchronous circuits take descriptions in concurrent hardware description languages and generate networks of macromodules or handshake components. In this paper, we propose a peephole optimizer for these networks. Our peephole optimizer first deduces an equivalent blackbox behavior for the network using Dill's tracetheoretic parallel composition operator. It then applies a new procedure called burst-mode reduction to obtain burst-mode machines from the deduced behavior. In a significant number of examples, our optimizer achieves gate-count improvements by a factor of five, and speed (cycle-time) improvements by a factor of two. Burst-mode reduction can be applied to any macromodule network that is delay insensitive as well as deterministic. A significant number of asynchronous circuits, especially those generated by asynchronous high-level synthesis tools, fall into this class, thus making our procedure widely applicable

    Estimating performance of an ray- tracing ASIC design

    Get PDF
    Journal ArticleRecursive ray tracing is a powerful rendering technique used to compute realistic images by simulating the global light transport in a scene. Algorithmic improvements and FPGA-based hardware implementations of ray tracing have demonstrated realtime performance but hardware that achieves performance levels comparable to commodity rasterization graphics chips is still not available. This paper describes the architecture and ASIC implementations of the DRPU design (Dynamic Ray Processing Unit) that closes this performance gap. The DRPU supports fully programmable shading and most kinds of dynamic scenes and thus provides similar capabilities as current GPUs. It achieves high efficiency due to SIMD processing of floating point vectors, massive multithreading, synchronous execution of packets of threads, and careful management of caches for scene data. To support dynamic scenes B-KD trees are used as spatial index structures that are processed by a custom traversal and intersection unit and modified by an Update Processor on scene changes

    Fred: an architecture for a self-timed decoupled computer

    Get PDF
    Journal ArticleDecoupled computer architectures provide an effective means of exploiting instruction level parallelism. Selftimed micropipeline systems are inherently decoupled due to the elastic nature of the basic FIFO structure, and may be ideally suited for constructing decoupled computer architectures. Fred is a self-timed decoupled, pipelined computer architecture based on micropipelines. We present the architecture of Fred, with specific details on a micropipelined implementation that includes support for multiple functional units and out-of-order instruction completion due to the self-timed decoupling

    Performance analysis and optimization of asynchronous circuits

    Get PDF
    Journal ArticleAsynchronous/Self-timed circuits are beginning to attract renewed attention as promising means of dealing with the complexity of modern VZSI designs. Very few analysis techniques or tools are available for estimating their performance. In this paper we adapt the theory of Generalized Timed Petri-nets (GTPN) for analyzing and comparing asynchronous circuits ranging from purely control-oriented circuits to those with data dependent control. Experiments with the GTPN analyzer are found to track the observed performance of actual asynchronous circuits, thereby offering empirical evidence toward the soundness of the modeling approach

    A partial scan methodology for testing self-timed circuits

    Get PDF
    technical reportThis paper presents a partial scan method for testing control sections of macromodule based self-timed circuits for stuck-at faults. In comparison with other proposed test methods for self-timed circuits, this technique offers better fault coverage than methods using self-checking techniques, and requires fewer storage elements to be made scannable than full scan approaches with similar fault coverage. A new method is proposed to test the sequential network in this partial scan environment. Experimental data is presented to show that high fault coverage is possible using this method with only a subset of storage elements being made scannable
    corecore